NcPred for Accurate Nuclear Protein Prediction Using n-mer Statistics with Various Classification Algorithms

Authors

  • Md. Saiful Islam
  • Alaol Kabir
  • Kazi Sakib
  • M. Alamgir Hossain
Abstract

Prediction of nuclear proteins is one of the major challenges in genome annotation. This paper describes NcPred, a method that predicts nuclear proteins with higher accuracy by combining n-mer statistics with different classification algorithms, namely Alternating Decision (AD) Tree, Best First (BF) Tree, Random Tree and Adaptive Boosting (AdaBoost). On the BaCelLo dataset [1], NcPred improves accuracy by about 20% with Random Tree and sensitivity by about 10% with AdaBoost for animal proteins, compared to existing techniques. For fungal proteins, it increases accuracy by 20% and recall by 4% with AD Tree. For human proteins, accuracy improves by about 25% and sensitivity by about 10% with BF Tree. Performance analysis of NcPred demonstrates its suitability relative to contemporary in-silico nuclear protein classification methods.
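As an illustration of what "n-mer statistics" can look like in practice, the following is a minimal Python sketch that turns a protein sequence into a fixed-length vector of relative n-mer frequencies. The abstract does not specify NcPred's exact feature definition, so the function names (`nmer_frequencies`, `feature_vector`), the choice of overlapping windows, the default `n=2`, and the restriction to the 20 standard amino acids are all assumptions made for this example.

```python
from collections import Counter
from itertools import product

# Assumption: only the 20 standard amino acids are counted.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def nmer_frequencies(sequence, n=2):
    """Return relative frequencies of overlapping n-mers in a sequence.

    Hypothetical helper: windows containing non-standard residues
    (for example 'X') are skipped rather than counted.
    """
    sequence = sequence.upper()
    windows = (sequence[i:i + n] for i in range(len(sequence) - n + 1))
    valid = [w for w in windows if all(c in AMINO_ACIDS for c in w)]
    total = len(valid)
    if total == 0:
        return {}
    return {nmer: count / total for nmer, count in Counter(valid).items()}


def feature_vector(sequence, n=2):
    """Map a sequence to a dense vector with one entry per possible n-mer.

    The vector has 20**n entries in a fixed lexicographic order, so vectors
    from different sequences are directly comparable.
    """
    freqs = nmer_frequencies(sequence, n)
    return [freqs.get("".join(p), 0.0) for p in product(AMINO_ACIDS, repeat=n)]


if __name__ == "__main__":
    vec = feature_vector("MKKAVLLG", n=2)
    print(len(vec), sum(vec))
```

Vectors of this kind could then be passed to a tree-based or boosting classifier such as those named in the abstract; which toolkit and parameters the authors used is not stated here.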


Related resources

Prediction of Protein Sub-Mitochondria Locations Using Protein Interaction Networks

Background: Prediction of protein localization is among the most important issues in bioinformatics and is used to predict the proteins present in cells and organelles such as mitochondria. In this study, several machine learning algorithms are applied to predict intracellular protein locations. These algorithms use the features extracted from pro...


A New Hybrid Method for Improving the Performance of Myocardial Infarction Prediction

Abstract Introduction: Myocardial Infarction, also known as heart attack, normally occurs due to such causes as smoking, family history, diabetes, and so on. It is recognized as one of the leading causes of death in the world. Therefore, the present study aimed to evaluate the performance of classification models in order to predict Myocardial Infarction, using a feature selection method tha...


Early Prediction of Gestational Diabetes Using Decision Tree and Artificial Neural Network Algorithms

Introduction: Gestational diabetes is associated with many short-term and long-term complications in mothers and newborns; hence, the detection of its risk factors can contribute to the timely diagnosis and prevention of relevant complications. The present study aimed to design and compare Gestational diabetes mellitus (GDM) prediction models using artificial intelligence algorithms. Materials ...


Accuracy of String Kernels for Protein Sequence Classification

Determining protein sequence similarity is an important task for protein classification and homology detection. Typically this may be done using sequence alignment algorithms, yet fast and accurate alignment-free kernel based classifiers exist. Viewing sequences as a “bag of words”, we test a simple weighted string kernel, investigating the effects of k-mer length, sequence length and choice of...


Automatic classification of highly related Malate Dehydrogenase and L-Lactate Dehydrogenase based on 3D-pattern of active sites

Accurate protein function prediction is an important subject in bioinformatics, especially where sequentially and structurally similar proteins have different functions. Malate dehydrogenase and L-lactate dehydrogenase are two evolutionary related enzymes, which exist in a wide variety of organisms. These enzymes are sequentially and structurally similar and share common active site residues, spati...




Publication date: 2011